Algorithm for sequence mining using gap constraints
Abstract
The sequence mining problem consists of finding frequent sequential patterns in a database of timestamped events. Some application domains require limiting the maximum temporal gap between events in the input sequences. However, enforcing such a constraint is difficult for most sequence mining algorithms. In this paper we describe CCSM (Cache-based Constrained Sequence Miner), a new level-wise algorithm that overcomes the difficulties usually associated with this kind of constraint. CCSM computes the support of candidate sequences through k-way intersections of idlists. The k-way intersection method is enhanced by an effective cache that stores intermediate idlists for future reuse. Reusing these intermediate results leads to a surprising reduction in the number of join operations actually performed on idlists. CCSM has been experimentally compared with cSPADE, a state-of-the-art algorithm, on several synthetically generated datasets, achieving better or similar results in most cases.
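The idea of counting support through cached idlist intersections can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class `CachedMiner`, the helpers `build_idlists` and `temporal_join`, and the `joins` counter are hypothetical names, and the sketch assumes that each event carries a single item and that only a maximum gap is enforced between consecutive pattern elements. An idlist here maps a sequence id to the timestamps at which a pattern occurrence ends.

```python
from collections import defaultdict


def build_idlists(database):
    """Map each item to {sequence id: sorted list of timestamps}."""
    idlists = defaultdict(lambda: defaultdict(list))
    for sid, events in database.items():
        for timestamp, item in sorted(events):
            idlists[item][sid].append(timestamp)
    return idlists


def temporal_join(prefix_idlist, item_idlist, max_gap):
    """Keep item occurrences that follow a prefix occurrence within max_gap."""
    joined = {}
    for sid, prefix_ends in prefix_idlist.items():
        times = [t for t in item_idlist.get(sid, [])
                 if any(0 < t - end <= max_gap for end in prefix_ends)]
        if times:
            joined[sid] = times
    return joined


class CachedMiner:
    """Hypothetical helper: support counting with a cache of prefix idlists."""

    def __init__(self, database, max_gap):
        self.idlists = build_idlists(database)
        self.max_gap = max_gap
        self.cache = {}   # pattern (tuple of items) -> idlist
        self.joins = 0    # number of joins actually performed

    def idlist(self, pattern):
        pattern = tuple(pattern)
        if pattern in self.cache:
            return self.cache[pattern]
        if len(pattern) == 1:
            result = self.idlists.get(pattern[0], {})
        else:
            # Reuse the cached idlist of the prefix, then join one more item.
            prefix = self.idlist(pattern[:-1])
            last = self.idlists.get(pattern[-1], {})
            result = temporal_join(prefix, last, self.max_gap)
            self.joins += 1
        self.cache[pattern] = result
        return result

    def support(self, pattern):
        """Number of input sequences containing the pattern under the gap limit."""
        return len(self.idlist(pattern))


# Toy database (assumed for illustration): sequence id -> [(timestamp, item)].
database = {
    1: [(1, "a"), (2, "b"), (4, "c")],
    2: [(1, "a"), (5, "b"), (6, "c")],
    3: [(1, "a"), (3, "b"), (10, "c")],
}
miner = CachedMiner(database, max_gap=2)
print(miner.support(("a", "b")), miner.support(("a", "b", "c")), miner.joins)
```

Because the idlist of `("a", "b")` is already cached when `("a", "b", "c")` is evaluated, the second call performs one join instead of two. That is the kind of saving the abstract attributes to the cache, shown here on a toy scale.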
Similar resources
cSPADE -UE: Algorithm for Sequence Mining for Unstructured Elements Using Time Gap Constraints
We present a new state machine that combines two techniques for complex data sequences: data modeling and frequent sequence mining. This algorithm relies on an unstructured variable-gap sequence miner to mine frequent patterns with different gaps between elements. Here we will have two variations: Sequence pruning technique for other primary frequent sequences to reduce space complexity and allow...
NOSEP: Nonoverlapping Sequence Pattern Mining With Gap Constraints
Sequence pattern mining aims to discover frequent subsequences as patterns in a single sequence or a sequence database. By combining gap constraints (or flexible wildcards), users can specify special characteristics of the patterns and discover meaningful subsequences suitable for their own application domains, such as finding gene transcription sites from DNA sequences or discovering patterns ...
Efficiently Mining Closed Subsequences with Gap Constraints
Mining frequent subsequence patterns from sequence databases is a typical data mining problem, and various efficient sequential pattern mining algorithms have been proposed. In many problem domains (e.g., biology), the frequent subsequences confined by predefined gap requirements are more meaningful than general sequential patterns. In this paper we re-examine the closed sequential patter...
Generalization of Pattern-Growth Methods for Sequential Pattern Mining with Gap Constraints
The problem of sequential pattern mining is one of several that have received particular attention in the general area of data mining. Despite the important developments in recent years, the best algorithm in the area (PrefixSpan) does not deal with gap constraints and consequently does not allow for the introduction of background knowledge into the process. In this paper we present the gen...
Protein Sequence Pattern Mining with Constraints
Considering the characteristics of biological sequence databases, which typically have a small alphabet, very long sequences and a relatively small size (several hundred sequences), we propose a new sequence mining algorithm (gIL). gIL was developed for linear sequence pattern mining and results from the combination of some of the most efficient techniques used in sequence and itemset mining. ...
Publication date: 2014